R as a programming language

no developer would mistake R for a programming language

Simon Urbanek, R Core developer, R Journal 13(2), 22-24, 10.32614/RJ-2021-111


S has a simple goal: To turn ideas into software, quickly and faithfully

John Chambers, Programming with data: A guide to the S language, ISBN 978-0-387-98503-9.

R is an Algorithm Interface

R is a descendant of S, which was designed as an algorithm interface

A hand-drawn sketch initialled JMC, titled Algorithm Interface, dated 5/5/76. In the middle, a circle labelled ABC, in the margin defined as a general (FORTRAN) algorithm. Around the circle, a rectangle, labelled XABC, in the margin defined as a Fortran subroutine to provide an interface between ABC and language and/or utility programs

Initial design sketch for “The System” by John Chambers, lead developer of S
From: Interfaces, Efficiency and Big Data, useR! 2014

When might we consider using R for research software?

  • Prototyping
  • To provide a user-friendly interface
    • R interface for data analytic scripting
    • R-based dashboard/web application for non-coders
  • As a glue language
    • Interfaces with other languages
    • Creating a full pipeline from raw data to data product

Prototyping

R is a high-level language

R comes with functions for common tasks in scientific computing, e.g.,

Linear algebra

M %*% N
svd(M)

Random number generation and sampling

rnorm(100, mean = 0, sd = 1)
sample(1:100, size = 10, replace = FALSE)

Optimization

optimize(function(x) {(x - 1/3)^2}, 
         interval = c(0, 1))

R is a high-level language (ctd.)

R comes with functions for common tasks in data analytics, e.g.,

Reading data

study_data <- read.csv("study_data.csv", header = TRUE)

Data manipulation

sort(study_data$variable1, decreasing = TRUE)
as.numeric(sub("ID", "", study_data$id))

Statistics and data visualisation

lm(response ~ variable1 + variable2, data = study_data)
kmeans(study_data[c("variable1", "variable2")], centers = 2)
plot(response ~ variable1, data = study_data)
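
The calls above assume a file study_data.csv with columns id, response, variable1 and variable2. As a minimal sketch for trying them out, the following simulates such a data set and writes it to a temporary file; the sample size, seed and simulated values are illustrative assumptions, not data from a real study.

```r
# Simulate a toy data set with the column names used in the examples above.
# All values are made up for illustration.
set.seed(1)
n <- 20
study_data <- data.frame(
  id = paste0("ID", seq_len(n)),
  variable1 = rnorm(n),
  variable2 = runif(n)
)
study_data$response <- 1 + 2 * study_data$variable1 + rnorm(n, sd = 0.5)

# Round-trip through a CSV file in a temporary directory
csv_file <- file.path(tempdir(), "study_data.csv")
write.csv(study_data, csv_file, row.names = FALSE)
study_data <- read.csv(csv_file, header = TRUE)

# The data manipulation and modelling calls now run as shown
id_number <- as.numeric(sub("ID", "", study_data$id))
fit <- lm(response ~ variable1 + variable2, data = study_data)
clusters <- kmeans(study_data[c("variable1", "variable2")], centers = 2)
```

Writing to tempdir() keeps the sketch from leaving files in the working directory.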

R still has building blocks for programming

  • Basic data types
    • logical
    • numeric (integer, double, complex)
    • character
  • Control flow
    • if/else and switch
    • for and while loops
  • Operators
    • mathematical
    • logical
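
As a rough illustration of these building blocks, the sketch below uses each of them once. The variable names and values (label, describe_sign and so on) are made up for the example.

```r
# Basic data types
is_valid <- TRUE            # logical
count <- 3L                 # integer
ratio <- 0.25               # double
z <- 1 + 2i                 # complex
label <- "control"          # character

# Control flow: if/else and switch
describe_sign <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}
group_code <- switch(label, control = 0, treatment = 1, NA)

# for and while loops
total <- 0
for (i in seq_len(count)) {
  total <- total + i
}
halvings <- 0
while (ratio < 1) {
  ratio <- ratio * 2
  halvings <- halvings + 1
}

# Mathematical and logical operators
remainder <- 7 %% 3
in_range <- total > 5 && total <= 10
```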

R facilitates operating at a higher level

x <- c(1, 5, 7)
log(x)
[1] 0.000000 1.609438 1.945910
M <- matrix(c(1, 0, 1, 0), nrow = 2, ncol = 2)
M == 1
      [,1]  [,2]
[1,]  TRUE  TRUE
[2,] FALSE FALSE
date_series <- seq.Date(from = as.Date("2025-01-01"), 
                        to = as.Date("2025-01-10"), by = "day")
date_series - date_series[1]
Time differences in days
 [1] 0 1 2 3 4 5 6 7 8 9
list_data <- list(x = 1:3, M = matrix(1:4))
lapply(list_data, mean)
$x
[1] 2

$M
[1] 2.5
summary(trees)
     Girth           Height       Volume     
 Min.   : 8.30   Min.   :63   Min.   :10.20  
 1st Qu.:11.05   1st Qu.:72   1st Qu.:19.40  
 Median :12.90   Median :76   Median :24.20  
 Mean   :13.25   Mean   :76   Mean   :30.17  
 3rd Qu.:15.25   3rd Qu.:80   3rd Qu.:37.30  
 Max.   :20.60   Max.   :87   Max.   :77.00  

Access more functions via R packages

R packages provide additional data structures and functions, e.g.

install.packages("xml2", quiet = TRUE)
library(xml2)
exchange_rates <- read_xml("https://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml")
class(exchange_rates)
[1] "xml_document" "xml_node"    

Tip

We can use namespacing to access package functions directly, e.g. xml2::read_xml()
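
The package::function() form works for any installed package, including those shipped with R. A small sketch using the stats, utils and datasets packages, which come with a standard R installation, so nothing needs to be installed:

```r
# Call functions through their namespace, without library()
middle <- stats::median(c(3, 1, 10))
first_rows <- utils::head(datasets::trees, n = 2)

# Namespacing also disambiguates when two attached packages export
# the same name: the prefix states which one is meant.
filtered <- stats::filter(1:5, rep(1 / 3, 3))
```

The last call is a three-point moving average; the prefix makes clear that the stats version of filter() is intended even if another attached package defines a function of the same name.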

The default repository is CRAN, which has >20,000 packages.

R is an interpreted language

We can easily explore interactively when developing a function
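
A sketch of what such exploration might look like at the console, while drafting a small function that extracts the numeric part of an identifier. The id values and the function name parse_id are hypothetical.

```r
# Start from a small example object and inspect it
ids <- c("ID10", "ID2", "ID33")
str(ids)

# Try the core expression directly at the console
sub("ID", "", ids)
as.numeric(sub("ID", "", ids))

# Once it behaves as expected, wrap it in a function
parse_id <- function(id) {
  as.numeric(sub("ID", "", id))
}
parse_id(ids)

# Check an edge case interactively before relying on the function
parse_id("ID")   # nothing is left after removing the prefix, so this is NA
```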

Speed up with Compiled Code

A disadvantage of high-level, interpreted languages is that they can be slow.

Since R is designed as an algorithm interface, we can speed up bottlenecks with compiled code.

Two R packages make this easier for R programmers:

  • Rcpp
  • quickr
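
Before reaching for either package, it can help to see where the overhead comes from. The sketch below compares an explicit R loop with the equivalent vectorised call, whose inner loop runs in compiled code. The function names are made up for illustration, and the timings will vary by machine.

```r
# Sum of squares with an explicit, interpreted loop
sum_squares_loop <- function(x) {
  total <- 0
  for (value in x) {
    total <- total + value^2
  }
  total
}

# The same computation with vectorised built-ins implemented in compiled code
sum_squares_vec <- function(x) {
  sum(x^2)
}

x <- as.numeric(1:1e5)
loop_time <- system.time(loop_result <- sum_squares_loop(x))
vec_time <- system.time(vec_result <- sum_squares_vec(x))

# Timings depend on the machine; compare the "elapsed" entries
loop_time["elapsed"]
vec_time["elapsed"]
all.equal(loop_result, vec_result)
```

When no vectorised built-in covers the bottleneck, writing the inner loop in compiled code is the next step.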

Speeding up with Rcpp

Suppose we have an R function to compute the Manhattan distance between two points

\[d = \sum_{i=1}^{n} |a_i - b_i| \]

ManhattanR.R
ManhattanR <- function(a, b){
    sum(abs(a - b))
}
ManhattanC.cpp
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double ManhattanC(NumericVector a, NumericVector b) {
    return sum(abs(a - b));
}

Speeding up with Rcpp (ctd.)

Rcpp::sourceCpp() creates an R wrapper around the C++ code

Rcpp::sourceCpp("ManhattanC.cpp")
a <- c(1, 0)
ManhattanC(a, a + c(1, 7))
[1] 8
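
As a sanity check that does not need a compiler, the same distance can be worked by hand and cross-checked in base R. This sketch uses the ManhattanR definition from earlier and stats::dist() with method = "manhattan"; the names by_function and by_dist are just for the example.

```r
ManhattanR <- function(a, b) {
  sum(abs(a - b))
}

a <- c(1, 0)
b <- a + c(1, 7)

# Worked by hand: |1 - 2| + |0 - 7| = 1 + 7 = 8
by_function <- ManhattanR(a, b)

# stats::dist() computes distances between the rows of a matrix
by_dist <- as.numeric(dist(rbind(a, b), method = "manhattan"))
```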

Speeding up with quickr

The quickr::quick() function transpiles R code to Fortran

ManhattanR.R
ManhattanR <- function(a, b){
    sum(abs(a - b))
}
library(quickr)
ManhattanF <- quick(function(a, b){
    declare(type(a = double(NA)),
            type(b = double(NA)))
    out <- sum(abs(a - b))
    out
})
a <- c(1, 0)
ManhattanF(a, a + c(1, 7))
[1] 8

User-friendly interfaces

R Packages

The best practice for sharing R functions with other R users is to create an R package.

An R package minimally includes

  • DESCRIPTION defining authors, version, license
  • NAMESPACE defining imports and exports
  • /R directory of R scripts with function definitions
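
To make the three components concrete, the sketch below writes them by hand into a temporary directory. The package name, author details, licence and exported function are placeholders chosen for illustration; in practice helper packages usually generate these files.

```r
# Build the minimal structure by hand in a temporary directory.
# The package name, author and licence below are placeholders.
pkg_dir <- file.path(tempdir(), "demopkg")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)

writeLines(c(
  "Package: demopkg",
  "Title: Demonstration Package",
  "Version: 0.0.1",
  "Authors@R: person('A.', 'Developer', role = c('aut', 'cre'),",
  "    email = 'a.developer@example.org')",
  "Description: A minimal package used for illustration.",
  "License: GPL-3"
), file.path(pkg_dir, "DESCRIPTION"))

# Export one function; a real NAMESPACE is usually generated by roxygen2
writeLines("export(ManhattanR)", file.path(pkg_dir, "NAMESPACE"))

writeLines(c(
  "ManhattanR <- function(a, b) {",
  "    sum(abs(a - b))",
  "}"
), file.path(pkg_dir, "R", "ManhattanR.R"))

list.files(pkg_dir, recursive = TRUE)
```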

Creating an R Package

An R package can be created from scratch or from an existing directory

Exporting Functions

User functions should be documented and exported

groupSort.R
#' Order a Variable by a Grouping Variable
#'
#' Order the elements of a vector by the values of a grouping variable.
#'
#' @param x the variable to order
#' @param g the grouping variable
#' @param decreasing logical: if `TRUE`, order by decreasing values of `g`
#'
#' @returns A variable with the values of `x` sorted by the values of `g`.
#' @export
#' @examples
#' x <- 11:19
#' g <- rep(1:3, length.out = 9)
#' groupSort(x, g)
groupSort <- function(x, g, decreasing = FALSE) {

Generate the documentation with devtools::document()
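
The function body is not shown above. A minimal sketch that is consistent with the documented behaviour, assuming the implementation is built on order(), might look as follows. This is an assumption for illustration and not necessarily the code in demopkg.

```r
# Hypothetical body, assumed for illustration: reorder x by the sorted
# positions of the grouping variable g.
groupSort <- function(x, g, decreasing = FALSE) {
  x[order(g, decreasing = decreasing)]
}

# The example from the documentation
x <- 11:19
g <- rep(1:3, length.out = 9)
groupSort(x, g)
```

With this body, the elements of x belonging to group 1 come first, then those of group 2, then group 3.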

Sharing an R package

If we put the package in a Git repository (here on GitHub), those with access can install it from R, e.g.

remotes::install_github("hturner/demopkg")
library(demopkg)
groupSort(1:9, rep(1:3, length.out = 9))
[1] 1 4 7 2 5 8 3 6 9

Tip

Credentials for private Git repositories can be set with gitcreds::gitcreds_set()

Sharing R package binaries

Create an r-developer.r-universe.dev repo on GitHub and add

packages.json
[
    {
        "package": "demopkg",
        "url": "https://github.com/r-developer/demopkg"
    }
]

Install the r-universe app on GitHub. Then users can install a binary version with

install.packages("demopkg", repos = c(
  `R developer` = 'https://r-developer.r-universe.dev',
  CRAN = 'https://cloud.r-project.org'))

Tip

For private packages use devtools::build(), plus win-builder.r-project.org and/or mac.r-project.org/macbuilder if required.

Keeping an interactive workflow

We can keep an interactive workflow with devtools::load_all()

Designing an interface for non-coders

We can quickly design a Shiny UI with designer::designApp()

Writing a Shiny App

We can use the copied code to start creating a Shiny app

app.R
library(shiny)

ui <- bootstrapPage(
  title = "Shiny Application",
  theme = bslib::bs_theme(4),
  h1(
    "Group Sort"
  ),
  fileInput(
    inputId = "file",
    label = "Choose file"
  ),
  inputPanel(
    textInput(
      inputId = "x",
      label = "label"
    ),
    textInput(
      inputId = "g",
      label = "label"
    ),
    radioButtons(
      inputId = "order",
      label = "",
      choices = c("Increasing", "Decreasing")
    )
  ),
  DT::DTOutput(
    outputId = "table"
  )
)

Writing a Shiny App (ctd.)

app.R
server <- function(input, output, session) {
    # Reactive value to store the uploaded data
    uploaded_data <- reactive({
        req(input$file)
        data <- read.csv(input$file$datapath, header = TRUE)
        return(data)
    })

    # Reactive value for processed data
    processed_data <- reactive({
        req(uploaded_data())

        data <- uploaded_data()
        if (all(c(input$x, input$g) %in% names(data))){
            out <- vector(mode = "list")
            out[[input$x]] <- demopkg::groupSort(data[[input$x]],
                                                 data[[input$g]],
                                                 input$order == "Decreasing")
            return(as.data.frame(out))
        }
    })

    # Render the data table
    output$table <- DT::renderDT({
        req(processed_data())

        DT::datatable(processed_data())
    })
}

# Create the Shiny app
shinyApp(ui = ui, server = server)

Tip

We could pass the ui code to Shiny Assistant (https://gallery.shinyapps.io/assistant) to get an initial draft of the server function.
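
The column-selection logic in the processed_data reactive can be tried outside Shiny, which makes it easier to test. The sketch below is a plain-function version under stated assumptions: process_data and the toy data frame are hypothetical, and groupSort is a stand-in for demopkg::groupSort() so that the example runs without the package installed.

```r
# Stand-in for demopkg::groupSort(), assumed here so the sketch runs
# without the package installed.
groupSort <- function(x, g, decreasing = FALSE) {
  x[order(g, decreasing = decreasing)]
}

# Plain-function version of the logic in the processed_data reactive:
# x_name and g_name play the role of input$x and input$g.
process_data <- function(data, x_name, g_name, order = "Increasing") {
  if (!all(c(x_name, g_name) %in% names(data))) {
    return(NULL)  # mirrors the reactive returning nothing for unknown columns
  }
  out <- vector(mode = "list")
  out[[x_name]] <- groupSort(data[[x_name]], data[[g_name]],
                             order == "Decreasing")
  as.data.frame(out)
}

toy <- data.frame(score = c(5, 3, 9, 1), group = c(2, 1, 2, 1))
process_data(toy, "score", "group")
process_data(toy, "score", "missing_column")
```

In the app, the NULL case is what makes req(processed_data()) hold back the table until both text inputs name existing columns.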

Run Shiny App Locally

A Shiny app can be run locally with shiny::runApp("dir/containing/app.R")

Deploying a Shiny App

There are many options; the following are relatively simple and free

Route                R Packages Source                           Hosting
Posit Connect Cloud  CRAN, Bioconductor, Public GitHub           Cloud server
shinylive            CRAN, Bioconductor, R-universe              Self-hosted static HTML, GitHub Pages
shinyapps.io         CRAN, Bioconductor, Public/Private GitHub   Cloud server

Other options are paid (premium versions of shinyapps.io, Posit Connect Cloud) or more work (self-hosted ShinyProxy).

A Recipe for Shiny App in R Package in Private GitHub

  1. Sign up for shinyapps.io.

  2. Go to Account > Tokens > Show and copy code to run in R

    • Allow access to private GitHub repos via Account > Profile > Update Authentication
  3. Save app.R in inst/shiny of your R package.

  4. Commit and push.

  5. Install the package from GitHub

    remotes::install_github("hturner/demopkg")
  6. Deploy, using the SHA of the Git commit as the app name, so that it appears in the URL

    rsconnect::deployApp("inst/shiny",
                         appName = packageDescription("demopkg")$RemoteSha)

Share by Obscure URL

Try it out!

https://supershiny.shinyapps.io/e09f21d4cec5370c2afc656b0c06a6cd399e207d/

R as a glue language

Interfacing with standard C++

Suppose we already had a C++ function to compute the Manhattan distance

\[d = \sum_{i=1}^{n} |a_i - b_i| \]

#include <vector>
#include <cmath>

double Manhattan(const std::vector<double>& a, const std::vector<double>& b) {
    int n = a.size();
    double out = 0.0;  

    for (int i = 0; i < n; ++i) {
        out += std::abs(a[i] - b[i]);
    }

    return out;
}

Interfacing with standard C++ using Rcpp

Within the package source directory run usethis::use_rcpp("Manhattan")

Interfacing with standard C++ using Rcpp (ctd.)

As directed, create demopkg-package.R

demopkg-package.R
## usethis namespace: start
#' @importFrom Rcpp sourceCpp
#' @useDynLib demopkg, .registration = TRUE
## usethis namespace: end
NULL

Interfacing with standard C++ using Rcpp (ctd.)

Now add the existing C++ code to Manhattan.cpp

Manhattan.cpp
#include <Rcpp.h>
using namespace Rcpp;

#include <vector>
#include <cmath>

double Manhattan(const std::vector<double>& a, const std::vector<double>& b) {
    int n = a.size();
    double out = 0.0;

    for (int i = 0; i < n; ++i) {
        out += std::abs(a[i] - b[i]);
    }

    return out;
}

Interfacing with standard C++ using Rcpp (ctd.)

Add // [[Rcpp::export]] so that devtools::document() creates an R wrapper

Manhattan.cpp
#include <vector>
#include <cmath>
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double Manhattan(const std::vector<double>& a, const std::vector<double>& b) {
    int n = a.size();
    double out = 0.0;

    for (int i = 0; i < n; ++i) {
        out += std::abs(a[i] - b[i]);
    }

    return out;
}

RcppExports.R
# Generated by using Rcpp::compileAttributes() -> do not edit by hand
# Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393

Manhattan <- function(a, b) {
    .Call(`_demopkg_Manhattan`, a, b)
}

Interfacing with standard C++ using Rcpp (ctd.)

To export the R function, document in the .cpp file

Manhattan.cpp
using namespace Rcpp;

//' Manhattan Distance
//'
//' Compute the Manhattan distance between two points.
//'
//' @param a numeric vector of coordinates of the first point
//' @param b numeric vector of coordinates of the second point
//'
//' @returns The Manhattan distance between `a` and `b`, as a single number.
//' @export
//' @examples
//' a <- c(1, 0)
//' Manhattan(a, a + c(1, 7))
// [[Rcpp::export]]
double Manhattan(const std::vector<double>& a, const std::vector<double>& b) {

Interfacing with other languages

R can interface with

R for Data Gathering and Reporting

R can handle data from the start of a research study…

R for Analysis and Reporting

… to the production of research outputs

  • Analysis
    • CRAN Task Views cover fields of statistics (e.g. Bayesian statistics, Spatial Statistics, Survival Analysis) and domains (Clinical trials, Finance, Official statistics)
    • Bioconductor repository for analysing omics data
  • Reporting
    • Quarto for creating documents, dashboards, websites based on R, Python, Julia and ObservableJS
    • Shiny for dashboards, interactive web/mobile apps.

Data pipelines/workflows

Orchestrating all the steps of a workflow in R supports reproducible research.

A data scientist might do this via

An RSE might contribute via

  • An R package to perform a specific step in the analysis
  • A Shiny app to implement a standard workflow from start to finish
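
As a minimal sketch of what orchestrating a workflow in R can look like, the example below chains a few steps from raw data to a reported result. The step functions (load_raw, clean, fit_model, report) are hypothetical names, and the built-in trees data stands in for raw study data.

```r
# Each step is a function, so the whole analysis can be re-run from scratch.
# The built-in trees data stands in for raw study data.
load_raw <- function() {
  datasets::trees
}

clean <- function(raw) {
  # Keep complete rows and work on the log scale
  complete <- raw[stats::complete.cases(raw), ]
  data.frame(log_volume = log(complete$Volume),
             log_girth = log(complete$Girth),
             log_height = log(complete$Height))
}

fit_model <- function(cleaned) {
  stats::lm(log_volume ~ log_girth + log_height, data = cleaned)
}

report <- function(model) {
  round(stats::coef(model), 2)
}

# Run the pipeline end to end
raw <- load_raw()
cleaned <- clean(raw)
model <- fit_model(cleaned)
report(model)
```

Keeping each step as a function means any one of them could later move into a package or be called from a Shiny app without changing the others.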

Ecosystem and Community

Ecosystem

  • Large ecosystem of packages for data science to build on and integrate with
    • Strengths: Bioinformatics, Spatial Statistics, Epidemiology, Time Series, Survey Analysis, Statistical Modelling and Data visualisation
    • Weaknesses: ML, Deep Learning, Scientific Computing
  • Supportive of software development in research context

Community

Summary

  • R is good for
    • Fast development
    • Creating user-friendly code/no code interfaces
    • Interfacing with other languages
    • Reproducible data-analytic workflows
  • Intended user community, area of data science and domain of application can affect whether it is the best choice